Inspired by the success of contrastive learning (CL) in computer vision and natural language processing, graph contrastive learning (GCL) has been developed to learn discriminative node representations on graph datasets. However, the development of GCL on heterogeneous information networks (HINs) is still in its infancy. For example, it is unclear how to augment HINs without substantially altering the underlying semantics, and how to design the contrastive objective to fully capture the rich semantics. Moreover, early investigations demonstrate that CL suffers from sampling bias, whereas conventional debiasing techniques have been empirically shown to be inadequate for GCL. How to mitigate the sampling bias in heterogeneous GCL is thus another important problem. To address the aforementioned challenges, we propose a novel Heterogeneous Graph Contrastive Multi-view Learning (HGCML) model. In particular, we use metapaths as augmentations to generate multiple subgraphs as views, and propose a contrastive objective to maximize the mutual information between any pair of metapath-induced views. To alleviate the sampling bias, we further propose a positive sampling strategy that explicitly selects positives for each node by jointly considering the semantic and structural information preserved on each metapath view. Extensive experiments demonstrate that HGCML consistently outperforms state-of-the-art baselines on five real-world benchmark datasets.
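The abstract contains no code, but the view-pairwise contrastive objective can be illustrated with a minimal InfoNCE-style sketch in PyTorch. The function name, the temperature value, and the use of a single cross-view positive per node are our own simplifying assumptions, not the paper's exact loss or positive-sampling strategy:

```python
import torch
import torch.nn.functional as F

def view_pair_infonce(z1, z2, tau=0.5):
    """InfoNCE-style loss between node embeddings of two metapath-induced views.

    z1, z2: [num_nodes, dim] embeddings of the same nodes under two views.
    The positive for node i in view 1 is node i in view 2; all other nodes act
    as negatives. (The paper's positive-sampling strategy would add further
    positives; this sketch keeps only the basic cross-view term.)
    """
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau                 # cosine similarities between views
    labels = torch.arange(z1.size(0))          # diagonal entries are the positives
    return F.cross_entropy(logits, labels)

# toy usage: two views of 8 nodes with 16-dimensional embeddings
loss = view_pair_infonce(torch.randn(8, 16), torch.randn(8, 16))
```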
We present LoFTR, a novel method for local image feature matching. Instead of performing image feature detection, description, and matching sequentially, we propose to first establish pixel-wise dense matches at a coarse level and later refine the good matches at a fine level. In contrast to dense methods that use a cost volume to search correspondences, we use self- and cross-attention layers in a Transformer to obtain feature descriptors that are conditioned on both images. The global receptive field provided by the Transformer enables our method to produce dense matches in low-texture areas, where feature detectors usually struggle to produce repeatable interest points. Experiments on indoor and outdoor datasets show that LoFTR outperforms state-of-the-art methods by a large margin. LoFTR also ranks first on two public benchmarks of visual localization among the published methods. Code is available at our project page: https://zju3dv.github.io/loftr/.
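As a rough illustration of the coarse dense-matching step (a dual-softmax over feature similarities followed by mutual-nearest-neighbour selection), assuming PyTorch and omitting the attention layers and the fine-level refinement; names and thresholds are illustrative only:

```python
import torch

def coarse_dense_match(feat_a, feat_b, temperature=0.1, threshold=0.2):
    """Sketch of coarse-level dense matching between two feature maps.

    feat_a: [Na, d], feat_b: [Nb, d] -- flattened coarse features of images A and B
    (in LoFTR these would already be conditioned on both images by the
    self/cross-attention layers; here they are just given tensors).
    Returns index pairs (i, j) that are mutual nearest neighbours with a
    sufficiently confident matching score.
    """
    sim = feat_a @ feat_b.t() / temperature
    prob = sim.softmax(dim=0) * sim.softmax(dim=1)          # dual-softmax matching probability
    mask = (prob == prob.max(dim=1, keepdim=True).values) & \
           (prob == prob.max(dim=0, keepdim=True).values) & \
           (prob > threshold)                               # mutual nearest neighbours
    return mask.nonzero(as_tuple=False)                     # [num_matches, 2]

matches = coarse_dense_match(torch.randn(100, 32), torch.randn(120, 32))
```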
Recently, distributed semi-supervised learning (DSSL) algorithms have shown their effectiveness in exploiting unlabeled samples over interconnected networks, where agents cannot share their raw data with each other and can only communicate non-sensitive information with their neighbors. However, existing DSSL algorithms cannot cope with data uncertainties and may suffer from high computation and communication overhead. To address these issues, we propose a distributed semi-supervised fuzzy regression (DSFR) model with fuzzy if-then rules and interpolation consistency regularization (ICR). ICR, which was proposed recently for semi-supervised problems, can force the decision boundary to pass through sparse data regions, thereby increasing model robustness. However, its application in distributed scenarios has not yet been considered. In this work, we propose a distributed fuzzy C-means (DFCM) method and a distributed interpolation consistency regularization (DICR), both built on the well-known alternating direction method of multipliers, to locate the parameters in the antecedent and consequent components of DSFR, respectively. Notably, the DSFR model converges very quickly since it does not involve a back-propagation procedure, and it scales to large datasets by benefiting from DFCM and DICR. Experimental results on both artificial and real-world datasets show that the proposed DSFR model achieves better performance than state-of-the-art DSSL algorithms in terms of both loss value and computational cost.
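A centralized, minimal sketch of the interpolation consistency regularization term may help fix ideas; the distributed ADMM-based DICR of the paper is not reproduced, and the function name, Beta parameter, and MSE consistency loss are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def icr_loss(model, x_unlabeled, alpha=0.75):
    """Interpolation consistency regularization (ICR) on unlabeled data.

    Encourages model(mix(x_i, x_j)) to agree with mix(model(x_i), model(x_j)),
    which pushes the decision boundary toward sparse (low-density) regions.
    This is a centralized sketch; the paper's DICR distributes it via ADMM
    across agents.
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x_unlabeled.size(0))
    x_mix = lam * x_unlabeled + (1 - lam) * x_unlabeled[perm]
    with torch.no_grad():
        y_mix_target = lam * model(x_unlabeled) + (1 - lam) * model(x_unlabeled[perm])
    return F.mse_loss(model(x_mix), y_mix_target)

# toy usage with a linear regressor on 32 unlabeled samples
model = torch.nn.Linear(4, 1)
loss = icr_loss(model, torch.randn(32, 4))
```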
Future wireless networks are expected to support diverse mobile services, including artificial intelligence (AI) services and ubiquitous data transmission. Federated learning (FL), as a revolutionary learning approach, enables collaborative AI model training across distributed mobile edge devices. By exploiting the superposition property of multiple-access channels, over-the-air computation allows a large number of devices to upload their models simultaneously over the same radio resources, and thus drastically reduces the communication cost of FL. In this paper, we study the coexistence of over-the-air FL and traditional information transmission (IT) in a mobile edge network. We propose a coexisting federated learning and information transmission (CFLIT) communication framework, in which FL and IT devices share the wireless spectrum in an OFDM system. Under this framework, we aim to maximize the IT data rate while guaranteeing a given FL convergence performance by optimizing the long-term radio resource allocation. A key challenge limiting the spectrum efficiency of the coexisting system lies in the large overhead incurred by frequent communication between the server and the edge devices for FL model aggregation. To address this challenge, we rigorously analyze the impact of the computation-to-communication ratio on the convergence of over-the-air FL over wireless fading channels. The analysis reveals the existence of an optimal computation-to-communication ratio that minimizes the amount of radio resources needed for over-the-air FL to converge to a given error tolerance. Based on this analysis, we propose a low-complexity online algorithm to jointly optimize the radio resource allocation for the FL devices and the IT devices. Extensive numerical simulations verify the superior performance of the proposed design for the coexistence of FL and IT devices in wireless cellular systems.
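As a toy illustration of the computation-to-communication ratio, the sketch below runs `local_steps` local gradient steps per aggregation round, with aggregation modeled as a plain average plus Gaussian noise standing in for the fading multiple-access channel; this is a simplified stand-in for the paper's over-the-air system model, and all names are hypothetical:

```python
import copy
import torch
import torch.nn.functional as F

def over_the_air_fl_round(global_model, device_data, local_steps, lr=0.01, noise_std=0.0):
    """One FL round with a tunable computation-to-communication ratio.

    `local_steps` is the number of local gradient steps each device runs before
    a single (over-the-air) aggregation; the paper analyzes how this ratio
    trades radio-resource usage against convergence. Aggregation is modeled
    here as averaging plus optional channel noise -- a toy abstraction only.
    """
    local_models = []
    for x, y in device_data:                       # each tuple holds one device's local data
        local = copy.deepcopy(global_model)
        opt = torch.optim.SGD(local.parameters(), lr=lr)
        for _ in range(local_steps):
            opt.zero_grad()
            F.mse_loss(local(x), y).backward()
            opt.step()
        local_models.append(local)
    with torch.no_grad():                          # "over-the-air" aggregation
        for name, p in global_model.named_parameters():
            stacked = torch.stack([dict(m.named_parameters())[name] for m in local_models])
            p.copy_(stacked.mean(dim=0) + noise_std * torch.randn_like(p))
    return global_model

# toy usage: two devices, a linear model, five local steps per aggregation
model = torch.nn.Linear(3, 1)
data = [(torch.randn(16, 3), torch.randn(16, 1)) for _ in range(2)]
over_the_air_fl_round(model, data, local_steps=5)
```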
Federated edge learning (FEEL) has emerged as a revolutionary paradigm for developing AI services at the edge of 6G wireless networks, as it supports collaborative model training across a massive number of mobile devices. However, model communication over wireless channels, especially the uplink model uploading in FEEL, has been widely recognized as a bottleneck that severely limits the efficiency of FEEL. Although over-the-air computation can alleviate the excessive cost of radio resources in FEEL model uploading, practical implementations of over-the-air FEEL still face several challenges, including a severe straggler issue, large communication overhead, and potential privacy leakage. In this article, we study these challenges and leverage reconfigurable intelligent surfaces (RIS), a key enabler of future wireless systems, to address them. We review the state-of-the-art solutions on RIS-empowered FEEL and explore promising research opportunities for adopting RIS to enhance FEEL performance.
Federated learning (FL) has recently emerged as a promising technique for enabling artificial intelligence (AI) at the network edge, where distributed mobile devices collaboratively train a shared AI model under the coordination of an edge server. To significantly improve the communication efficiency of FL, over-the-air computation allows a large number of mobile devices to upload their local models simultaneously by exploiting the superposition property of wireless multiple-access channels. Due to wireless channel fading, the model aggregation error at the edge server is dominated by the weakest channel among all devices, causing a severe straggler issue. In this paper, we propose a relay-assisted cooperative FL scheme to effectively address this straggler issue. In particular, we deploy multiple half-duplex relays to cooperatively assist the devices in uploading their local model updates to the edge server. The nature of over-the-air computation poses system objectives and constraints that differ from those in conventional relay communication systems. Moreover, the strong coupling between the design variables makes such a system challenging to optimize. To tackle this, we propose an alternating-optimization-based algorithm to optimize the transceiver and relay operations with low complexity. We then analyze the model aggregation error in the single-relay case and show that our relay-assisted scheme achieves a smaller error than its counterpart without relays. The analysis provides critical insights on relay deployment for the implementation of cooperative FL. Extensive numerical results show that our design achieves faster convergence compared with state-of-the-art schemes.
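A toy numerical sketch of why the weakest channel dominates the over-the-air aggregation error under channel-inversion power control; the formula is a standard illustrative abstraction, not the paper's exact error expression:

```python
import numpy as np

def over_the_air_mse(channel_gains, noise_power=0.1, power_budget=1.0):
    """Toy aggregation-error model for over-the-air model averaging.

    With channel-inversion power control, every device pre-scales its signal so
    that all signals add coherently at the server; the common scaling factor is
    then limited by the weakest channel, so one deep-fade device (straggler)
    amplifies the effective noise for everyone. Relays, as in the paper, raise
    this minimum effective gain.
    """
    eta = power_budget * np.min(np.abs(channel_gains)) ** 2   # receive scaling limited by weakest link
    return noise_power / (eta * len(channel_gains) ** 2)

gains = np.array([0.9, 1.1, 0.05, 1.0])   # one straggler in deep fade
print(over_the_air_mse(gains))            # the error blows up because of the 0.05 link
```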
Dynamic textures (DTs) exhibit statistical stationarity in the spatial domain and stochastic repetitiveness in the temporal dimension, indicating that different frames of a DT are highly correlated in similarity, which is key prior knowledge. However, existing methods cannot effectively learn a promising synthesis model for high-dimensional DTs from a small number of training samples. In this paper, we propose a novel DT synthesis method that fully exploits this prior knowledge to address the problem. Our method is based on the proposed kernel similarity embedding, which not only alleviates the high-dimensionality and small-sample issues but also has the advantage of modeling nonlinear feature relationships. Specifically, we first put forward two hypotheses that are essential for a DT model to generate new frames using similarity correlation. Then, we integrate kernel learning and the extreme learning machine into a unified synthesis model to learn the kernel similarity embedding that represents the DT. Extensive experiments on DT videos collected from the Internet and two benchmark datasets (i.e., Gatech GraphCut Textures and DynTex) demonstrate that the learned kernel similarity embedding effectively provides discriminative representations for DTs. Accordingly, our method preserves the long-term temporal continuity of the synthesized DT sequences with excellent sustainability and generalization. Meanwhile, it generates realistic DT videos at high speed and low computational cost compared with state-of-the-art methods. Code and more synthesis videos are available at our project page: https://shiming-chen.github.io/similarity-page/similarit.html.
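As a minimal sketch of the similarity-correlation prior, the snippet below computes an RBF kernel (Gram) matrix between flattened frames in NumPy; the choice of kernel and bandwidth is our own assumption, and the paper's actual embedding and extreme-learning-machine training are not reproduced:

```python
import numpy as np

def rbf_kernel_similarity(frames, gamma=1e-3):
    """Kernel similarity (Gram) matrix between flattened DT frames.

    frames: [T, H*W*C] array, one row per frame. The RBF kernel captures the
    nonlinear similarity correlation between frames that the paper exploits;
    how this Gram matrix is embedded and used by the synthesis model is
    specific to the paper and not shown here.
    """
    sq_dists = np.sum(frames ** 2, axis=1, keepdims=True) \
             + np.sum(frames ** 2, axis=1) - 2 * frames @ frames.T
    return np.exp(-gamma * sq_dists)

K = rbf_kernel_similarity(np.random.rand(20, 64 * 64 * 3))   # 20 toy frames
print(K.shape)                                               # (20, 20)
```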
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point-cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, while its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
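A very rough, DETR-style sketch of fusing image and point-cloud tokens with learned object queries; positional/3D coordinate encodings and all CMT-specific components are omitted, and every name here is hypothetical rather than taken from the released code:

```python
import torch
import torch.nn as nn

class ToyMultiModalDetector(nn.Module):
    """Sketch of query-based multi-modal 3D detection in the spirit of CMT.

    Image tokens and point-cloud tokens are simply concatenated into one memory
    sequence; learned object queries attend to it via a transformer decoder and
    are projected to box parameters.
    """
    def __init__(self, dim=128, num_queries=50, box_dim=10):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=3)
        self.box_head = nn.Linear(dim, box_dim)

    def forward(self, img_tokens, pts_tokens):
        memory = torch.cat([img_tokens, pts_tokens], dim=1)        # [B, Ni+Np, dim]
        q = self.queries.unsqueeze(0).expand(memory.size(0), -1, -1)
        return self.box_head(self.decoder(q, memory))              # [B, num_queries, box_dim]

boxes = ToyMultiModalDetector()(torch.randn(2, 100, 128), torch.randn(2, 200, 128))
```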
Knowledge graphs (KGs) have served as a key component of various natural language processing applications. Commonsense knowledge graphs (CKGs) are a special type of KG, where entities and relations are composed of free-form text. However, previous works on KG completion and CKG completion suffer from long-tail relations and newly added relations that do not have many known triples for training. In light of this, few-shot KG completion (FKGC), which requires the strengths of graph representation learning and few-shot learning, has been proposed to address the problem of limited annotated data. In this paper, we comprehensively survey previous attempts at such tasks, covering both methods and applications. Specifically, we first introduce FKGC challenges and commonly used KGs and CKGs. Then we systematically categorize and summarize existing works in terms of the type of KG and the methods employed. Finally, we present applications of FKGC models on prediction tasks in different areas and share our thoughts on future research directions of FKGC.
Few-Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes with only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold: first, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features; second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, performance on the novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modifications. When benchmarking on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots; e.g., we boost nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on few-shot object detection. Code and models will be available.
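A minimal sketch of the feature-level enhancement idea (masked average pooling of support features into a class center that re-weights query features), assuming PyTorch; the instance-level object-query linking via cross-attention is omitted, and the sigmoid re-weighting is our own illustrative choice:

```python
import torch

def mask_pooled_reweight(query_feats, support_feats, support_masks):
    """Sketch of mask-based dynamic re-weighting of query features.

    support_feats: [S, C, H, W] support feature maps
    support_masks: [S, 1, H, W] binary instance masks
    query_feats:   [B, C, H, W] query feature maps
    A per-class center is obtained by masked average pooling over the support
    features; query features are then channel-wise re-weighted by that center.
    """
    center = (support_feats * support_masks).sum(dim=(0, 2, 3)) / \
             support_masks.sum(dim=(0, 2, 3)).clamp(min=1e-6)     # [C] class center
    weight = torch.sigmoid(center).view(1, -1, 1, 1)              # channel-wise weights
    return query_feats * weight

out = mask_pooled_reweight(torch.randn(2, 64, 32, 32),
                           torch.randn(3, 64, 32, 32),
                           torch.randint(0, 2, (3, 1, 32, 32)).float())
```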